Accept binary file-like objects in to_geotiff and the readers#1512

Merged
brendancol merged 2 commits into xarray-contrib:main from brendancol:bytesio-source
May 8, 2026

@brendancol
Contributor

Closes #1511.

What changed

  • New _BytesIOSource in _reader.py wraps any binary file-like (read+seek). A lock around seek+read keeps thread-pool windowed reads from racing on the buffer's cursor.
  • _open_source and read_to_array take either a string or a file-like.
  • _read_geo_info reads bytes directly from a buffer. We don't mmap.mmap arbitrary file-likes since they may not back a real fd.
  • open_geotiff takes a buffer. Raises ValueError for gpu=True or chunks=... with a buffer, since those paths re-open the source by path from worker tasks or device-side readers.
  • to_geotiff takes any path with a write method. Raises ValueError for cog=True + file-like (see Deferred). The .vrt branch is gated on isinstance(path, str) so a buffer can't accidentally land in the VRT path.
  • _write_bytes writes straight to the buffer when given a file-like. String paths keep the existing temp-file + os.replace atomic write.
  • Dask to_geotiff falls back from write_streaming to eager in-memory assembly for buffer destinations. The streaming writer patches IFD offsets in place and needs a real filesystem path.
  • _is_fsspec_uri in both modules now type-checks before string ops.
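
The locking described in the first bullet can be sketched as follows. This is a minimal illustration, not the code in _reader.py: the class name LockedBufferSource and the read_range method are hypothetical stand-ins for _BytesIOSource, and it assumes only that the wrapped object supports seek, tell, and read.

```python
import io
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedBufferSource:
    """Hypothetical stand-in for _BytesIOSource: ranged reads from a seekable buffer."""

    def __init__(self, fileobj):
        self._f = fileobj
        self._lock = threading.Lock()
        # Size the buffer once by seeking to the end and asking for the position.
        with self._lock:
            self._f.seek(0, io.SEEK_END)
            self._size = self._f.tell()

    @property
    def size(self):
        return self._size

    def read_range(self, start, length):
        # seek and read must run as one step under the lock; otherwise another
        # thread could move the shared cursor between the two calls.
        with self._lock:
            self._f.seek(start)
            return self._f.read(length)


payload = bytes(range(256)) * 64
source = LockedBufferSource(io.BytesIO(payload))

# Several workers read disjoint windows from the same buffer concurrently.
windows = [(offset, 128) for offset in range(0, len(payload), 128)]
with ThreadPoolExecutor(max_workers=8) as pool:
    chunks = list(pool.map(lambda w: source.read_range(*w), windows))
```

Without the lock, two workers could interleave their seek and read calls and return bytes from the wrong window, which is the race the bullet refers to.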

Deferred

  • cog=True to a file-like would need overview passes plus IFD patching against the buffer. Out of scope here.
  • VRT writes to a file-like: VRT is filesystem-only.
  • Streaming dask writes to a file-like: currently eager-materialises. Worth revisiting if anyone hits memory pressure on large dask -> buffer.
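
As a rough sketch of how the rejections and the VRT gate fit together on the write side: the helper below is hypothetical (plan_write and its return values do not exist in the library) and only mirrors the rules stated above, namely that cog=True with a file-like raises ValueError and that the .vrt branch is taken for string paths only.

```python
import io


def plan_write(path, cog=False):
    """Hypothetical routing helper; returns the name of the write branch."""
    is_buffer = not isinstance(path, str) and hasattr(path, "write")
    if is_buffer and cog:
        # Deferred: COG output needs overview passes and IFD patching.
        raise ValueError("cog=True is not supported for file-like destinations")
    # Only string paths are inspected for the .vrt extension, so a buffer
    # that happens to carry a .vrt name never reaches the VRT branch.
    if isinstance(path, str) and path.lower().endswith(".vrt"):
        return "vrt"
    return "buffer" if is_buffer else "file"


class NamedBuffer(io.BytesIO):
    # A buffer whose name attribute looks like a VRT path.
    name = "mosaic.vrt"


try:
    plan_write(io.BytesIO(), cog=True)
    cog_rejected = False
except ValueError:
    cog_rejected = True
```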

Tests

xrspatial/geotiff/tests/test_bytesio_source.py adds 7 cases: round-trip, uint8 round-trip, windowed read, cog=True reject, VRT-extension non-trigger, and concurrent reads from one source via a thread pool.

Full geotiff suite: 674 pass. The 3 remaining failures are pre-existing matplotlib palette tests on origin/main, unrelated to this PR.

Closes xarray-contrib#1511.

- New _BytesIOSource wraps any read+seek file-like; a lock around
  seek+read keeps thread-pool windowed reads race-free.
- _open_source, read_to_array, _read_geo_info, and open_geotiff
  accept either a string or a file-like.
- to_geotiff accepts any path with a write method; _write_bytes
  writes straight to the buffer for file-likes and keeps the
  temp-file + os.replace atomic write for string paths.
- Reject cog=True for file-likes (deferred), gpu=True / chunks
  for file-like sources, and gate VRT branches on isinstance str
  so buffers can't accidentally hit the VRT code path.
- Dask + file-like falls back to eager in-memory assembly since
  write_streaming patches IFD offsets in place on a temp path.

Tests: xrspatial/geotiff/tests/test_bytesio_source.py covers
round-trip, windowed read, COG/VRT rejection, and concurrent
reads from one source.
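
The two write paths in this commit message can be illustrated with a short sketch. It assumes nothing about the real _write_bytes beyond what is stated above; the function name write_bytes and its argument order are hypothetical.

```python
import io
import os
import tempfile


def write_bytes(data, dest):
    """Hypothetical sketch of the dispatch; not the library's _write_bytes."""
    if not isinstance(dest, str) and hasattr(dest, "write"):
        # File-like destination: write directly. The caller owns the buffer,
        # so it is not closed here.
        dest.write(data)
        return

    # String path: write to a temp file in the same directory, then swap it
    # into place so readers never observe a partially written file.
    directory = os.path.dirname(os.path.abspath(dest))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        os.replace(tmp_path, dest)
    except BaseException:
        # Clean up the temp file if the write or the replace failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


buf = io.BytesIO()
write_bytes(b"II*\x00", buf)

with tempfile.TemporaryDirectory() as tmpdir:
    target = os.path.join(tmpdir, "out.tif")
    write_bytes(b"II*\x00", target)
    with open(target, "rb") as fh:
        on_disk = fh.read()
    leftovers = [n for n in os.listdir(tmpdir) if n.endswith(".tmp")]
```

Creating the temp file in the destination's own directory keeps os.replace on a single filesystem, which is what makes the swap atomic.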
@github-actions (bot) added the performance label (PR touches performance-sensitive code) on May 7, 2026
@brendancol requested a review from Copilot on May 7, 2026 18:52

Copilot AI left a comment

Pull request overview

This PR extends the GeoTIFF reader/writer APIs to accept in-memory/binary file-like objects (e.g., io.BytesIO) in addition to string paths, enabling fully in-memory read/write workflows and adding tests for round-trip and concurrent windowed reads.

Changes:

  • Added a file-like-backed reader source (_BytesIOSource) and updated reader entry points to accept str or binary file-like objects.
  • Updated open_geotiff/to_geotiff to validate unsupported combinations for file-like sources/destinations (e.g., dask/gpu reads, cog=True writes).
  • Added a dedicated test module covering BytesIO round-trips, windowed reads, and concurrent access.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| xrspatial/geotiff/_reader.py | Adds file-like detection and _BytesIOSource to support ranged reads from seekable buffers. |
| xrspatial/geotiff/_writer.py | Extends _write_bytes to write to file-like destinations and hardens _is_fsspec_uri for non-strings. |
| xrspatial/geotiff/__init__.py | Broadens public API to accept file-like inputs and adds early validation for unsupported modes. |
| xrspatial/geotiff/tests/test_bytesio_source.py | Adds new tests for buffer-based read/write and concurrency behavior. |

Comment thread xrspatial/geotiff/tests/test_bytesio_source.py Outdated
Comment on lines +1510 to +1515
or any binary file-like object exposing ``write``."""
import os

# File-like destination: append the encoded bytes. The caller owns
# the buffer's lifetime (we don't close it).
if not isinstance(path, str) and hasattr(path, 'write'):
Comment thread xrspatial/geotiff/__init__.py
Comment thread xrspatial/geotiff/_reader.py Outdated
Comment on lines +383 to +384
self._size = fileobj.tell()
try:
Comment on lines +450 to +452
raise TypeError(
f"source must be a str path/URL or a binary file-like object "
f"with read+seek methods, got {type(source).__name__}")
Comment on lines +346 to 350
# VRT files (string paths only -- VRT XML references other files on disk)
if isinstance(source, str) and source.lower().endswith('.vrt'):
return read_vrt(source, dtype=dtype, window=window, band=band,
name=name, chunks=chunks, gpu=gpu,
max_pixels=max_pixels)
Comment on lines +404 to +406
if isinstance(source, str):
import os
name = os.path.splitext(os.path.basename(source))[0]
Comment on lines +727 to +730
elif not isinstance(path, str):
raise TypeError(
f"path must be a str or a binary file-like with a write() "
f"method, got {type(path).__name__}")
…runcate-on-rewrite, tell()

- `_coerce_path` normalises `os.PathLike` (e.g. `pathlib.Path`) to `str` at
  the top of every public reader/writer entry. Path('mosaic.vrt') now routes
  to read_vrt, Path('x.tif') derives a name, etc.
- `to_geotiff` rejects `gpu=True` with a file-like destination up front.
  The write_geotiff_gpu path was never tested with buffers and would have
  hit `_write_bytes(path)` without truncating.
- `_write_bytes` rewinds and truncates the buffer before writing when the
  destination supports it. Two writes to the same BytesIO now overwrite
  rather than concatenate, matching string-path semantics.
- `_is_file_like` now requires `tell` in addition to `read`/`seek`.
  `_BytesIOSource` calls `tell()` to size the buffer; the previous gate
  let inputs that can read and seek but not tell pass through, and they
  crashed inside `__init__`. We drop the guarded-tell pattern in the
  constructor in favour of a single try/except that raises a clear
  ValueError if the buffer is unusable (e.g. closed).
- Drop unused `threading` import from the test file.
- Tests cover Path round-trip, Path('.vrt') VRT routing, Path-derived name,
  GPU+buffer rejection, BytesIO overwrite-on-rewrite, and the tell()
  requirement at the gate.
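
Three of the behaviours listed in this commit can be sketched with the standard library alone. The helper names below (coerce_path, is_file_like, overwrite_buffer) are hypothetical and only approximate what `_coerce_path`, `_is_file_like`, and `_write_bytes` are described as doing; the NoTell class exists purely to exercise the gate.

```python
import io
import os
from pathlib import Path


def coerce_path(obj):
    # Normalise os.PathLike (e.g. pathlib.Path) to str; leave everything else alone.
    return os.fspath(obj) if isinstance(obj, os.PathLike) else obj


def is_file_like(obj):
    # A readable source must expose read, seek, and tell.
    return all(callable(getattr(obj, name, None)) for name in ("read", "seek", "tell"))


def overwrite_buffer(data, buf):
    # Rewind and truncate when supported so a second write replaces the first
    # instead of being appended after it.
    if hasattr(buf, "seek") and hasattr(buf, "truncate"):
        buf.seek(0)
        buf.truncate()
    buf.write(data)


class NoTell:
    """Readable and seekable, but deliberately missing tell()."""

    def read(self, n=-1):
        return b""

    def seek(self, pos, whence=0):
        return 0


buf = io.BytesIO()
overwrite_buffer(b"first write", buf)
overwrite_buffer(b"second", buf)
```

After the second call the buffer holds only the second payload, which matches the overwrite semantics of writing twice to the same string path.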
@brendancol brendancol merged commit 4ada8f0 into xarray-contrib:main May 8, 2026
11 checks passed

Development

Successfully merging this pull request may close these issues.

Geotiff readers/writers reject file-like (BytesIO) sources despite advertising 'str | path-like'
